Abstract

Background: Induced pluripotent stem cells (iPSCs) derived from somatic cells have enormous potential for clinical applications. Notably, it was recently reported that reprogramming somatic cells into iPSCs can induce genomic copy number variation (CNV), which is one of the major genetic causes of human disease. However, it was unclear whether this genome instability depends on the reprogramming method and/or the genetic background of the donor cells. Furthermore, genome-wide CNV analysis is technically challenging, and CNV data need to be interpreted with care.

Results: To carefully investigate possible CNV instability during somatic reprogramming, we performed genome-wide CNV analyses of 41 mouse iPSC lines generated from the same parental donor, thereby controlling for the donor's genetic background. Different reprogramming factor combinations and dosages were used to investigate potential method-dependent effects on genome integrity. We detected 63 iPSC CNVs using high-resolution comparative genomic hybridization. Intriguingly, CNV rates were negatively associated with the dosages of the classic factor(s). Furthermore, the use of high-performance engineered factors led to fewer CNVs than the classic factor(s) at the same dosage.

Conclusion: Our observations suggest that sufficient reprogramming force can protect the genome from CNV instability during the reprogramming process.
Background

Induced pluripotent stem cells (iPSCs), which are derived from somatic cells through reprogramming via several methods [1], have numerous potential applications, particularly in regenerative medicine, disease modeling and drug screening [2]. However, safe and effective reprogramming methods for producing high-quality iPSCs remain to be developed [3]. Before personalized stem cell therapies can be developed, genome integrity and other safety concerns of iPSC technology must be addressed [4], particularly as genome stability can have profound effects on the pluripotency, differentiation and tumorigenicity of the resulting iPSCs [5]. Notably, it was recently shown that reprogramming somatic cells to iPSCs could induce genome alterations such as copy number variation (CNV) [6-8]. Current evidence suggests that these reprogramming-associated CNVs could be either de novo mutations or enriched mosaic variations present in the donor cells [9,10]. CNVs are one of the major genetic causes of human disease [11]; therefore, it is imperative to carefully investigate possible CNV instability during somatic reprogramming before using iPSCs in a clinical or therapeutic setting.
Genome-wide CNV analysis is technically challenging, and CNV data need to be interpreted with caution [11,12]. To date, several genome technologies have been used for genomic CNV analysis [12]; however, the following points need to be taken into consideration before confirming CNV instability in iPSCs. Firstly, there are technical limitations to identifying CNVs accurately, as the CNV calls of single nucleotide polymorphism (SNP) microarrays are highly dependent on the external reference set used for the analysis [12]. The lack of internal references on SNP microarrays can lead to low signal-to-noise ratios during CNV calling, whereas CNV data obtained by comparative genomic hybridization (CGH) technology, in which test and reference DNA are co-hybridized on the same array, are more reliable [12]. Secondly, previous CNV calls in iPSCs could not readily distinguish reprogramming-associated CNVs (either de novo CNVs or selected mosaic CNVs) [10,13] from pre-existing germ-line CNVs in the parental cells. Finally, it is still unknown whether the reported CNV instability depends on the reprogramming method or on the genetic background of the parental cells [6,14], either of which could cause method-specific or donor-specific genome instability.

Considering the above concerns, we addressed the issue of CNV instability in iPSCs by taking the following factors into account in our study design. Mouse iPSCs (miPSCs) were generated from the same parental donor to exclude the effect of genetic background. Various combinations and dosages of reprogramming factors were used to compare CNVs between reprogramming methods. In addition, a high-density CGH microarray assay comparing miPSCs with their parental donor cells was used for genome-wide screening of CNVs associated with cell reprogramming (Figure 1). Intriguingly, our observations revealed a dosage effect of pluripotency factors on genome integrity during somatic reprogramming.

Results 

Initially, we obtained 16 miPSC lines using the three "Yamanaka" factors (Oct4/Klf4/Sox2, OKS) [15,16] or Oct4 alone [17] (Additional file 1): eight miPSC lines were obtained with O_0.5 and the other eight with OKS_1.5 (method names give the factor(s) and the total retrovirus volume in ml; see Methods). The dosage of each factor in these two methods was equivalent. Intriguingly, we identified 24 CNVs in the eight O_0.5 miPSC lines (i.e. 3.0 CNV/miPSC) and seven CNVs in the eight OKS_1.5 lines (i.e. approximately 0.9 CNV/miPSC) (Additional file 2). The rates of miPSC CNVs differ markedly between these two reprogramming methods, suggesting that the strength of iPSC reprogramming has an effect on genome integrity. This potentially indicates that reduced diversity of reprogramming factors and/or reduced reprogramming dosages may induce more CNVs during somatic reprogramming.

To further investigate the possible roles of reprogramming factor diversity and/or dosage in CNV instability, we generated another three sets of miPSC lines from the same donor using different factor combinations and low/high (i.e. 0.5 ml/1.5 ml) dosages (see Methods for details) (Additional file 1): five miPSC lines obtained with O_1.5, ten lines with OKS_0.5, and ten lines with the engineered factors XYZK_0.5 [18]. Using the CGH assay, we compared the resulting miPSC genomes with their parental genome and identified zero CNVs in the five O_1.5 miPSC lines (i.e. 0 CNV/miPSC), 25 CNVs in the ten OKS_0.5 lines (i.e. 2.5 CNV/miPSC), and seven CNVs in the ten XYZK_0.5 lines (i.e. 0.7 CNV/miPSC) (Additional file 2). In total, we screened the genomes of 41 miPSC lines and identified 63 CNVs across 24 genomic loci of the mouse genome (Figure 2 and Additional file 2).

To investigate the potential mechanism involved in miPSC genome instability, we compared CNV rates between methods that used the same reprogramming factor combinations at different dosages. The Mann-Whitney U test with exact significance was used for single-factor comparisons, and ANOVA was used for the two-factor comparison. Comparing CNV numbers between O_0.5 and O_1.5 miPSC lines, we observed more CNVs in O_0.5 miPSCs (24 CNVs/8 miPSCs) than in O_1.5 miPSCs (0 CNVs/5 miPSCs). This difference is statistically significant (p-value = 0.030, Mann-Whitney U test) (Figure 3A). Similarly, the comparison of CNV numbers in OKS_0.5 (25 CNVs/10 miPSCs) and OKS_1.5 miPSC lines (7 CNVs/8 miPSCs), although not significant (p-value = 0.146, Mann-Whitney U test; Figure 3B), still suggests that a low dosage of reprogramming factors may induce more CNVs than a high dosage. We also pooled the CNV data in Figure 3A and 3B by dosage. In the low dosage group (0.5 ml), 49 CNVs were detected in 18 miPSC lines (i.e. approximately 2.7 CNV/miPSC), while in the high dosage group (1.5 ml), seven CNVs were detected in 13 miPSC lines (i.e. approximately 0.5 CNV/miPSC). The difference in CNV rates was significant (p-value = 0.008, ANOVA) (Figure 3C), which supports the view that the dose of reprogramming factors, and consequently the reprogramming force, can significantly affect genome instability during reprogramming, with higher doses and stronger reprogramming providing a protective effect. Notably, recent studies have reported that reprogramming factor dosage can affect the epigenetic properties of iPSCs [3], and increased levels of Oct4 and Klf4 were observed to give rise to high-quality iPSCs [19,20].

Figure 1 The experimental flow of genome-wide CNV analyses in miPSCs using high-density CGH microarrays. A total of 41 miPSC lines were derived from the same donor, MEF B2, using different reprogramming factors and/or dosages. Genomic DNA from the parental MEF B2 cells and from each miPSC line was extracted and fragmented. Cy5-dUTP was used to label miPSC DNA and Cy3-dUTP to label donor DNA. Each labeled miPSC DNA sample was co-hybridized with labeled donor DNA onto mouse genome CGH microarrays. Microarray handling and data analysis followed the Agilent oligonucleotide CGH protocol. Examples of a copy number loss (deletion; depicted as a green concave segment) and a copy number gain (duplication; depicted as a red convex segment) are shown at the bottom.

To further explore the role of reprogramming force in miPSC CNV instability, we compared CNV rates between different factor combinations with the same total dosage. We introduced the engineered factors XYZK (Oct4-VP16, Sox2-VP16, Klf4 and Nanog-VP16) because of their strong capacity to promote reprogramming [18]. The CNV rates were: O_0.5 (24 CNVs/8 miPSCs), OKS_0.5 (25 CNVs/10 miPSCs) and XYZK_0.5 (7 CNVs/10 miPSCs) (Additional file 2). There was no significant difference between O_0.5 and OKS_0.5 (p-value = 0.633); however, XYZK_0.5 lines carried significantly fewer CNVs than O_0.5 lines (p-value = 0.043) and OKS_0.5 lines (p-value = 0.023) (Figure 3D). Particularly important was the observation that all seven CNVs detected in XYZK_0.5 came from just two of the ten iPSC lines, and six of those seven CNVs were in a single iPSC line. The remaining eight XYZK_0.5 iPSC lines had zero CNVs (Additional file 2). This suggests that the high-performance engineered factors XYZK are likely to help maintain genome integrity by reducing reprogramming barriers. These observations are also consistent with sufficient reprogramming force having a positive role in iPSC genome integrity.
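The XYZK_0.5 rate of 0.7 CNV/miPSC is concentrated in two lines, so a mean-based summary can hide how skewed the per-line distribution is. The sketch below encodes that distribution exactly as described in the text: six CNVs in one line, one in another, and none in the other eight. Line order is arbitrary. The sketch then contrasts mean and median. It illustrates why a rank-based test such as the Mann-Whitney U test is less sensitive to a single outlying line than a comparison of means, because it depends only on the ordering of per-line counts. The helper name `summarize` is hypothetical.

```python
from statistics import mean, median

# XYZK_0.5 per-line CNV counts as described in the text: six CNVs in one line,
# one CNV in a second line and none in the remaining eight (order is arbitrary).
xyzk_counts = [6, 1] + [0] * 8


def summarize(counts):
    """Total, mean, median, number of lines with >= 1 CNV, and the share of
    all CNVs contributed by the single most affected line."""
    total = sum(counts)
    return {
        "total": total,
        "mean": mean(counts),
        "median": median(counts),
        "lines_with_cnv": sum(1 for c in counts if c > 0),
        "max_share": max(counts) / total if total else 0.0,
    }


print(summarize(xyzk_counts))
```

Here the mean (0.7) and the median (0) tell different stories: most XYZK_0.5 lines are CNV-free, and one line accounts for six of the seven calls.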
Figure 2 The genome distribution of the 24 miPSC CNV loci identified in this study. Green bars indicate deletion loci, and red bars indicate duplication loci.



Discussion 

Mouse iPSCs were first generated by retroviral transduction of four transcription factors: Oct3/4, Sox2, Klf4 and c-Myc [1]. However, reactivation of c-Myc increases tumorigenicity in chimeras, hindering clinical applications [21]. Mice derived from c-Myc-free iPSCs showed a significantly reduced incidence of tumors compared with those derived using the four classic factors [16]. To generate high-quality iPSCs, we therefore excluded c-Myc from our study design. Given the low efficiency of iPSC induction without c-Myc, we also used optimized reprogramming culture conditions that give ultra-high efficiency of iPSC generation [15].

Using the reliable CGH technology, we found that reprogramming factor dosage was negatively associated with CNV rate. This result raises the possibility that sufficient reprogramming force may help maintain genome integrity during somatic reprogramming.

Figure 3 Comparison of CNV rates between miPSCs induced by different reprogramming methods. (A) Comparison of CNV numbers between O_0.5 and O_1.5 miPSC lines. The difference is statistically significant (p-value = 0.030, Mann-Whitney U test with exact significance). (B) Comparison of CNV numbers between OKS_0.5 and OKS_1.5 miPSC lines (Mann-Whitney U test with exact significance). (C) Combined analysis of (A) and (B). A significant difference in CNV numbers between high and low reprogramming factor dosages was found (p-value = 0.008, two-factor ANOVA). (D) Comparison of CNV numbers between methods using different reprogramming factors at equal total dosage (Mann-Whitney U test with exact significance). CNVs are significantly fewer in XYZK_0.5 miPSCs than in O_0.5 (p-value = 0.043) and OKS_0.5 (p-value = 0.023) miPSCs.

Since reprogramming is an artificial process that reverses somatic cell fate to a pluripotent state, it faces various epigenetic barriers that were set during normal differentiation [3]. Previous evidence showed that the reprogramming process can broadly be divided into two phases: a long stochastic phase of gene activation and a shorter, hierarchical, more deterministic phase of gene activation [3]. The stochastic nature of reprogramming suggests that epigenetic, rather than genetic, barriers can be seen as roadblocks on the journey to pluripotency [22]. Reprogramming factors initiate transcriptional effects as well as epigenetic regulation to help re-establish pluripotency [23,24]. Moreover, some regulators and chemicals, such as Jhdm1b, valproic acid and vitamin C, can overcome these epigenetic barriers and thereby markedly enhance reprogramming [3,25,26]. These observations suggest that the strength of reprogramming directed at epigenetic barriers is important for successful reprogramming. On the other hand, iPSCs derived through the stochastic reprogramming phase represent cells that have undergone greater epigenetic changes from the somatic to the pluripotent state, which could be regarded as a form of pressure. The CNV instability investigated in this study may be a pressure-induced event that takes part in overcoming epigenetic roadblocks. We therefore suggest that iPSCs might experience more genome instability during reprogramming if the reprogramming strength is insufficient; conversely, sufficient reprogramming force would lead to far fewer CNVs. Nevertheless, this hypothesis requires further investigation.

In total, we performed genome-wide CNV analyses on 41 miPSC lines derived using different reprogramming factors and/or dosages and detected 63 miPSC CNVs. The average CNV rate is approximately 1.5 per miPSC line, which suggests that reprogramming-associated CNVs are not frequent. Choosing appropriate reprogramming methods with sufficient reprogramming force is likely to help maintain the genome integrity of iPSCs.

Conclusions 

In summary, we showed that using the CGH microarray assay to directly compare the CNV status of miPSCs with that of their parental cells is a reliable way to identify CNV alterations associated with cell reprogramming. Based on genome-wide analyses of 41 miPSC lines derived by different methods, we suggest that increasing factor dosages, or using high-performance engineered factors [18], is beneficial for the genome integrity of the resulting miPSCs. Our observations highlight the importance of further investigation into the mechanisms and kinetics of cell reprogramming and their effects on iPSC genome integrity.

Methods 

Mouse iPSCs generated from the same donor 

An embryonic fibroblast cell line (MEF B2) derived from the OG2 mouse was used as the parental donor. The donor cells were infected with retroviruses carrying the indicated reprogramming factors for two days and then cultured in iCD1 medium to generate iPSCs [15]. Viruses were normalized to equal titer (low dosage, MOI = 15 when 0.5 ml of virus was used; high dosage, MOI = 45 when 1.5 ml of virus was used) according to the titer determined with the Takara Retrovirus Titer Set. In total, we obtained eight miPSC lines using single-factor Oct4 (0.5 ml Oct4, i.e. O_0.5), five lines using single-factor Oct4 (1.5 ml Oct4, i.e. O_1.5), ten lines using a three-factor combination (0.167 ml Oct4, 0.167 ml Klf4 and 0.167 ml Sox2, i.e. OKS_0.5), eight lines using a three-factor combination (0.5 ml Oct4, 0.5 ml Klf4 and 0.5 ml Sox2, i.e. OKS_1.5), and ten lines using previously reported engineered factors (0.125 ml Oct4-VP16 (X), 0.125 ml Sox2-VP16 (Y), 0.125 ml Nanog-VP16 (Z) and 0.125 ml Klf4 (K), i.e. XYZK_0.5) [18]. The reprogramming efficiencies of the different factor combinations have been described in previous studies [15,17,18]. iPSC colonies were picked based on Oct4-GFP expression and were validated as having a normal karyotype. All 41 miPSC lines were harvested at passage 4 for further analysis. All miPSCs were maintained in mES2i medium, i.e. DMEM supplemented with 15% (v/v) fetal bovine serum, glutamine, non-essential amino acids, 1000 U/ml LIF, 1 μM PD0325901 and 3 μM Chir99021. All animal experiments were approved by the institutional animal care and use committee (IACUC) of the Guangzhou Institutes of Biomedicine and Health (GIBH).
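The Methods give two calibration points: MOI 15 for 0.5 ml and MOI 45 for 1.5 ml of titer-normalized virus. The sketch below uses the usual definition of MOI (infectious units per target cell) and assumes a fixed number of donor cells. Under those assumptions MOI scales linearly with volume at about 30 per ml, which gives the per-factor MOI of each method. The linear-scaling assumption is ours; the source reports only the two calibration points. The names `MOI_PER_ML`, `per_factor_moi` and `total_moi` are hypothetical.

```python
# Assumption: all retroviral stocks were normalized to the same titer and the
# same number of donor cells was infected, so MOI is proportional to volume.
MOI_PER_ML = 15 / 0.5  # calibration from the Methods: 0.5 ml -> MOI 15

# Virus volume (ml) per factor for each method, as listed in the Methods
METHODS = {
    "O_0.5": {"Oct4": 0.5},
    "O_1.5": {"Oct4": 1.5},
    "OKS_0.5": {"Oct4": 0.167, "Klf4": 0.167, "Sox2": 0.167},
    "OKS_1.5": {"Oct4": 0.5, "Klf4": 0.5, "Sox2": 0.5},
    "XYZK_0.5": {"Oct4-VP16": 0.125, "Sox2-VP16": 0.125,
                 "Nanog-VP16": 0.125, "Klf4": 0.125},
}


def per_factor_moi(volumes_ml):
    """Estimated MOI contributed by each factor's virus."""
    return {factor: round(v * MOI_PER_ML, 2) for factor, v in volumes_ml.items()}


def total_moi(volumes_ml):
    """Estimated combined MOI of all viruses in one infection."""
    return round(sum(volumes_ml.values()) * MOI_PER_ML, 2)


for name, vols in METHODS.items():
    print(name, "total MOI:", total_moi(vols), "per factor:", per_factor_moi(vols))
```

Under this assumption, O_0.5 and OKS_1.5 each deliver Oct4 at MOI 15, matching the statement that the per-factor dosage of these two methods was equivalent. All "_0.5" methods share a total MOI of about 15. OKS_0.5 comes out slightly higher (about 15.03) only because 0.167 ml is a rounded third of 0.5 ml.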

High-resolution assay of comparative genomic hybridization microarray

Genomic DNA extracted from each miPSC line and from the parental donor (MEF B2) was fragmented by AluI and RsaI digestion. DNA labeling was conducted using the Agilent SureTag DNA Labeling Kit. Different fluorescent dyes were used to label DNA from the miPSCs (Cy5-dUTP) and from the parental donor cell line (Cy3-dUTP). Each labeled miPSC DNA sample was co-hybridized with labeled donor DNA onto an Agilent SurePrint G3 mouse 1 x 1M microarray for 40 hours at 65°C. DNA processing, microarray handling and scanning followed the Agilent oligonucleotide CGH protocol (version 6.0).

Genome-wide CNV analyses 

The microarray scanning profiles were processed with Agilent Feature Extraction 10.7.3.1. The extracted data were analyzed and plotted with Agilent Workbench 7.0. ADM-2 was selected as the statistical algorithm, with a threshold of 6.0 and Fuzzy Zero turned on. Each CNV was called from at least four consecutive probes with log2 ratios (the ratio of the miPSC-associated Cy5 fluorescence to the donor-associated Cy3 fluorescence) consistent with a deletion or duplication.
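The rule "at least four consecutive probes with log2 ratios consistent with a deletion or duplication" can be illustrated with a simplified run-length caller. This is a hypothetical sketch and not Agilent's ADM-2 algorithm. ADM-2 scores intervals statistically, and its threshold of 6.0 is not a log2-ratio cutoff. The sketch therefore uses a fixed log2 cutoff of 0.3, chosen only for illustration, and does not model Fuzzy Zero. The expected log2(Cy5/Cy3) values for a one-copy loss or gain at a diploid locus follow directly from the copy-number ratios 1/2 and 3/2.

```python
import math
from typing import List, Tuple

# Expected log2(Cy5/Cy3) at a diploid locus relative to a diploid reference:
#   heterozygous deletion: log2(1/2) = -1.0; single-copy gain: log2(3/2) ~ 0.585
HET_DEL = math.log2(1 / 2)
DUP = math.log2(3 / 2)


def call_segments(log2_ratios: List[float], cutoff: float = 0.3,
                  min_probes: int = 4) -> List[Tuple[int, int, str]]:
    """Return (start, end_inclusive, type) for runs of >= min_probes
    consecutive probes whose log2 ratio lies beyond +/- cutoff.

    Simplified run-length rule for illustration, NOT Agilent's ADM-2;
    the default cutoff of 0.3 is a hypothetical value."""
    calls = []
    start, state = None, None
    # A neutral sentinel probe at the end flushes a run that reaches the last probe.
    for i, r in enumerate(list(log2_ratios) + [0.0]):
        if r <= -cutoff:
            s = "deletion"
        elif r >= cutoff:
            s = "duplication"
        else:
            s = None
        if s != state:
            # A run of aberrant probes just ended; keep it if it is long enough.
            if state is not None and i - start >= min_probes:
                calls.append((start, i - 1, state))
            start, state = i, s
    return calls


# Toy profile: a 4-probe heterozygous deletion and a 3-probe gain
profile = [0.0, 0.1] + [HET_DEL] * 4 + [0.05] + [DUP] * 3 + [0.0]
print(call_segments(profile))  # the 3-probe gain is too short to be called
```

Requiring several consecutive probes trades sensitivity to very small CNVs for protection against single noisy probes. This is the same motivation behind the four-probe minimum used in this study.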

Statistical analysis 

The Mann-Whitney U test with exact significance was used to determine statistically significant differences in miPSC CNVs between reprogramming methods. ANOVA was used for the two-factor test in Figure 3C. Differences were considered statistically significant when p-value < 0.05.
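An exact Mann-Whitney U test on per-line CNV counts can be sketched by enumerating every way of splitting the pooled counts into groups of the original sizes. This is a minimal illustration, not the software used in this study. The two-sided p-value here is one common definition: the fraction of splits whose U lies at least as far from n1*n2/2 as the observed U. Ties, which are frequent with many zero-CNV lines, are handled by counting tied pairs as 1/2. Results may therefore differ slightly from a given statistics package's exact option. The per-line counts needed to reproduce the reported p-values are in Additional file 2, so the usage below uses toy data only. The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def u_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Mann-Whitney U for sample x: pairs with x > y, ties counted as 1/2."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_mann_whitney(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided exact p) by enumerating all splits of the pooled data.

    p is the fraction of splits whose U is at least as far from n1*n2/2 as the
    observed U; ties stay as observed, so the null is conditional on them."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    center = n1 * n2 / 2
    u_obs = u_statistic(x, y)
    observed = abs(u_obs - center)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point error in the comparison.
        if abs(u_statistic(gx, gy) - center) >= observed - 1e-9:
            extreme += 1
    return u_obs, extreme / total


# Toy data only: complete separation of two groups of three
print(exact_mann_whitney([1, 2, 3], [4, 5, 6]))  # (0.0, 0.1)
```

Full enumeration is practical at the group sizes used here (for example, C(18, 8) = 43,758 splits for 8 versus 10 lines). For larger groups, a library implementation is preferable.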

Additional files 



Additional file 1: Sample information of mouse iPSC lines. 
Additional file 2: The CNVs identified in mouse iPSC lines.